Special Note: While this R Markdown file serves as the source for all of the materials found in this project’s writeup, this file has a slightly different purpose and will not follow the paper’s order. Essentially, I want users to be able to follow this file as if it were a tutorial. As such, most of the code will be viewable, I will create more data frames than necessary so the user can inspect differences, and the order of creation will differ so that similar visualizations are grouped together. To see the intended narrative and order for story purposes, read the writeup.
The purpose of this analysis was to examine the spatial characteristics of Chicago crime data and to explore relationships between the density of that crime, proximity to locations of interest (Police and Transit Stations), and community demographics in order to inform further investment in police infrastructure. We also examined some of the temporal characteristics of crime to inform resource management/deployment. Our initial hypothesis was as follows:
The data from 2013-01-01 through 2014-12-31 indicated that crime rates do not decrease but actually increase as one moves closer to a police station, causing us to reject part 1 of the hypothesis. However, crime rates do increase sharply as one approaches a transit station, supporting part 2 of our initial hypothesis. Our final recommendation as it concerns police infrastructure can be found in the conclusion.
The “client” has requested analytic support to determine how to best invest in new police infrastructure. The purpose of this analysis is to determine significant spatial, temporal, and demographic crime patterns that could be used to guide investments and/or improve policing patterns. We will examine Chicago Crime Data from several angles and utilize multiple visual methods including the following:
Our initial spatial hypothesis:
Assuming both parts of the hypothesis prove true and the spatial relationships are strong, we will likely suggest building small, permanent outposts in the city’s most crime-dense communities, at their most affected transit stations. Such a strategy could reduce crime rates and place our force in the best position to combat crime in their communities.
Here are links to a few of the spatial and graphical packages used herein:
The superset of data is a combined set of files including Chicago crime data, geo-spatial polygons, mass transit station locations, police department locations, demographics, categorization tables, weather data, and more. All requisite data files can be found in the appropriate folder on Github. While the crime data currently covers dates from 2013-01-01 to 2014-12-31, it could easily be updated at will. Currently, I have intentionally left 2015 data out, as I am using it as test data for a separate analysis.
Base crime data, shape files, coordinate files, and census data come from the data repository at the City of Chicago website. Although not used in this iteration of the analysis, weather data was downloaded from Weather Underground. This analysis also uses a community-to-census-tract mapping table courtesy of Rob Paral. The base sources are subsequently scrubbed, reformatted, joined, etc. in R. This can be viewed via the RMarkdown file “CreateSuperDataSet.”
Each tuple in the base crime dataset equates to one crime. While the base crime data is robust, this analysis required a decent amount of “munging”: mutating, cleansing, etc. The data does include some null fields, duplications where a given crime may have had more than one victim, and some erroneous location data, such as crimes recorded within a police station, outside city limits, or in a different state. Some of those characteristics, like duplications for more than one victim, are appropriate and left alone. Missing or erroneous spatial data is dropped. You can see more detail and follow along via the appropriate file on Github. Most of the demographic data is provided by the City of Chicago’s Department of Public Health as underlying pieces of their “Hardship Index”. It includes the following:
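The cleansing itself lives in the “CreateSuperDataSet” file on Github; as a rough, base-R sketch of the spatial filtering described above (the bounding-box values here are approximate and purely illustrative, not the exact ones used):

```r
# Illustrative sketch only: drop rows with missing coordinates, then drop
# rows falling outside a rough Chicago bounding box. The bounds below are
# approximations for demonstration; the real script derives them from the
# community shapefile.
clean_crime <- function(df) {
  ok <- !is.na(df$Latitude.Crime) & !is.na(df$Longitude.Crime) &
        df$Longitude.Crime >= -87.95 & df$Longitude.Crime <= -87.52 &
        df$Latitude.Crime  >=  41.64 & df$Latitude.Crime  <=  42.03
  df[ok, , drop = FALSE]
}
```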
In order to keep the files robust, but manageable, I have created categories for each primary type of crime. These Meta Categories were created by grouping crime descriptions as closely to federal definitions as possible. Categories are as follows:
Note: The full map is viewable in Github. This analysis displays all 4 types in initial plots but is trimmed to just Violent Crimes and Non-Violent Property Crime for all mapping exercises.
Hour Categories are split into 4 hour increments as follows:
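The mapping from hour to category can be sketched with a small helper. The exact cut points are an inference, not stated in the source: the data structure shown later has both Hour 0 and Hour 23 falling in “Night,” which is consistent with 4-hour bins anchored at 02:00, so that is what this sketch assumes:

```r
# Assumed scheme (inferred): Night 22-02, Pre Dawn 02-06, AM Commute 06-10,
# Lunch 10-14, PM Commute 14-18, Evening 18-22.
hour_to_category <- function(hour) {
  shifted <- (hour - 2) %% 24          # shift so the "day" starts at 02:00
  labs <- c("Pre Dawn", "AM Commute", "Lunch",
            "PM Commute", "Evening", "Night")
  labs[(shifted %/% 4) + 1]            # integer-divide into 4-hour bins
}
```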
setwd("D:/Users/Jonathan Morken/Documents/GitHub/Capstone Project/")
crime_df_no_NAs <- read.csv("data/JoinedDataWithDist.csv", sep = ",", header = TRUE)
We need to cut extraneous columns and change the format of the following variables:

* Make “Beat” a factor
* Make “District” a factor
* Add order to the Hour.Category factor
* Add order to the DayofWeek factor
#Remove unnecessary columns
crime_df_no_NAs <- select(crime_df_no_NAs, ID, Meta.Category, Primary.Type, Beat, District, Community.Name, PERCENT.OF.HOUSING.CROWDED, PERCENT.HOUSEHOLDS.BELOW.POVERTY, PERCENT.AGED.16..UNEMPLOYED, PERCENT.AGED.25..WITHOUT.HIGH.SCHOOL.DIPLOMA, HARDSHIP.INDEX, PER.CAPITA.INCOME, Total.2010, Year, Month, DayinMonth, DayofWeek, Hour, Hour.Category, Latitude.Crime, Longitude.Crime, Closest.District, PD.Dist.km, Transit.index, Transit.Dist.km)
#Fix column name formats for the sake of standardization (one rename call covers all columns).
crime_df_no_NAs <- rename(crime_df_no_NAs,
                          Percent.of.Housing.Crowded = PERCENT.OF.HOUSING.CROWDED,
                          Percent.Households.Below.Poverty = PERCENT.HOUSEHOLDS.BELOW.POVERTY,
                          Percent.Aged.16.Unemployed = PERCENT.AGED.16..UNEMPLOYED,
                          Percent.Aged.25.Without.Hish.School.Diploma = PERCENT.AGED.25..WITHOUT.HIGH.SCHOOL.DIPLOMA, #"Hish" typo retained; later code references this name
                          Hardship.Index = HARDSHIP.INDEX,
                          Per.Capita.Income = PER.CAPITA.INCOME,
                          Census.Population = Total.2010)
#Change formats
crime_df_no_NAs$Beat <- as.factor(crime_df_no_NAs$Beat)
crime_df_no_NAs$District <- as.factor(crime_df_no_NAs$District)
#Make Hour Category an ordered factor and order them appropriately
print(levels(crime_df_no_NAs$Hour.Category))
## [1] "AM Commute" "Evening" "Lunch" "Night" "PM Commute"
## [6] "Pre Dawn"
crime_df_no_NAs$Hour.Category <- factor(crime_df_no_NAs$Hour.Category, levels(crime_df_no_NAs$Hour.Category)[ c(6,1,3,5,2,4)])
#Make DayofWeek an ordered factor
print(levels(crime_df_no_NAs$DayofWeek))
## [1] "Fri" "Mon" "Sat" "Sun" "Thurs" "Tues" "Wed"
crime_df_no_NAs$DayofWeek <- factor(crime_df_no_NAs$DayofWeek, levels(crime_df_no_NAs$DayofWeek)[ c(2,6,7,5,1,3,4)])
#Calculate Number of Years in data for use in annualized metrics.
Num_Years_in_Data <- as.integer(length(unique(crime_df_no_NAs$Year)))
#View data structure
str(crime_df_no_NAs)
## 'data.frame': 573280 obs. of 25 variables:
## $ ID : int 9936131 9923481 9913643 9912871 9909833 9909835 9909901 9910992 9909832 9934360 ...
## $ Meta.Category : Factor w/ 4 levels "Financial Crime/Fraud",..: 1 1 2 1 4 3 4 4 4 3 ...
## $ Primary.Type : Factor w/ 32 levels "ARSON","ASSAULT",..: 9 9 4 9 3 12 2 3 2 7 ...
## $ Beat : Factor w/ 274 levels "111","112","113",..: 273 100 190 83 165 179 61 10 157 265 ...
## $ District : Factor w/ 24 levels "1","2","3","4",..: 23 8 16 7 14 15 6 1 12 23 ...
## $ Community.Name : Factor w/ 77 levels "Albany Park",..: 33 4 58 24 76 6 5 49 43 9 ...
## $ Percent.of.Housing.Crowded : num 14.8 4 4.1 3.8 2.3 6.3 4 1.3 9.6 10.8 ...
## $ Percent.Households.Below.Poverty : num 33.9 10.4 11.6 46.6 14.7 28.6 27.6 13.8 25.8 18.7 ...
## $ Percent.Aged.16.Unemployed : num 17.3 11.7 12.6 28 6.6 22.6 28.3 4.9 15.8 14.6 ...
## $ Percent.Aged.25.Without.Hish.School.Diploma: num 35.4 17.7 19.3 28.5 12.9 24.4 18.5 7.4 40.7 37.3 ...
## $ Hardship.Index : int 85 37 35 94 10 73 74 7 76 70 ...
## $ Per.Capita.Income : int 13781 23482 24336 11888 43198 15957 15528 59077 16444 15461 ...
## $ Census.Population : int 56323 41081 64124 30654 82236 98514 48743 21390 35769 78743 ...
## $ Year : int 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
## $ Month : int 12 12 12 12 12 12 12 12 12 12 ...
## $ DayinMonth : int 31 31 31 31 30 30 30 30 30 30 ...
## $ DayofWeek : Factor w/ 7 levels "Mon","Tues","Wed",..: 3 3 3 3 2 2 2 2 2 2 ...
## $ Hour : int 0 0 0 0 23 23 23 23 23 23 ...
## $ Hour.Category : Factor w/ 6 levels "Pre Dawn","AM Commute",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ Latitude.Crime : num 41.9 41.7 42 41.8 41.9 ...
## $ Longitude.Crime : num -87.7 -87.7 -87.8 -87.6 -87.7 ...
## $ Closest.District : Factor w/ 23 levels "1","10","11",..: 5 21 7 19 5 6 19 1 2 15 ...
## $ PD.Dist.km : num 3.24 3.53 2.64 1.01 2.16 ...
## $ Transit.index : int 244 207 167 19 103 82 44 300 101 213 ...
## $ Transit.Dist.km : num 2.25 5.01 2.28 1.49 1.19 ...
Contrary to our initial hypothesis, crime rates do not decrease as one gets closer to a given police department. While the effects are not extreme, crime rates actually appear to increase:
Note: The increasing crime density table and graph images can be found in the image folder in the Github project.
#Graph Meta.category vs. Distance to closest PD
g1 <- ggplot(data = crime_df_no_NAs,aes(x = PD.Dist.km)) +
scale_x_continuous() +
scale_y_continuous(labels=comma) +
geom_hline(yintercept=0, size=0.4, color="black") +
geom_histogram(breaks=seq(0, 5, by = 0.25), fill="#c0392b", alpha=0.75) +
scale_color_brewer(palette="Set1", name = "Meta.Category") +
facet_wrap(~Meta.Category) +
labs(title = paste('Crime Proximity to Police Departments'),x = "Kilometers", y = "Occurrences") +
theme_light()
#Code to extract the figures behind the plot for the increasing-crime-density analysis. Note: remove the facet_wrap layer above before running this.
#head(ggplot_build(g1))
Some of the increase in the first ¼ kilometer may be due in part to incorrect location entries, such as crimes recorded as occurring within police stations. However, the overall slope is meaningful: while the area shrinks 98% moving from a 2 km radius to a ¼ km radius, crime density increases by 67%. This would seem to support the alternative hypothesis that police presence does not have a dampening effect on crime, but that arrests actually increase with police patrols.
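The area side of that comparison is simple circle geometry and can be checked directly (the 67% density figure additionally requires the per-band counts extracted via ggplot_build, which are not reproduced here):

```r
# Area of a disc of radius r, in km^2
disc_area <- function(r) pi * r^2

area_2km <- disc_area(2)     # area within a 2 km radius
area_qkm <- disc_area(0.25)  # area within a 1/4 km radius

# Fractional reduction in area moving from 2 km to 1/4 km
shrinkage <- 1 - area_qkm / area_2km
round(shrinkage, 3)  # ~0.984, i.e., the ~98% shrink cited above
```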
Here the initial hypothesis appears to be correct. Crime rate (density) clearly increases as one moves closer to a given transit station. Again, while the area decreases 98% from a 2 km radius to just a ¼ km radius, density increases by 985%. This effect starts to flatten out beyond 2 kilometers.
Note: The increasing crime density table and graph images can be found in the image folder in the Github project.
#Meta.category and Transit Location
g2 <- ggplot(data = crime_df_no_NAs,aes(x = Transit.Dist.km)) +
scale_x_continuous() +
scale_y_continuous(labels=comma) +
geom_hline(yintercept=0, size=0.4, color="black") +
geom_histogram(breaks=seq(0, 5, by = 0.25), fill="lightblue", alpha=0.75) +
scale_color_brewer(palette="Set1", name = "Meta.Category") +
facet_wrap(~Meta.Category) +
labs(title = paste('Crime Proximity to Transit Stations'),x = "Kilometers", y = "Occurrences") +
theme_light()
#Code to extract the figures behind the plot for the increasing-crime-density analysis. Note: remove the facet_wrap layer above before running this.
#head(ggplot_build(g2))
This could be further refined with transit station volume, but that data is not currently available. Regardless, it is clear that any future infrastructure investments must account for the city’s transit characteristics.
Note: This occurs later in the writeup.
Before we get to regression models, it may also be useful to examine some of the temporal characteristics of Chicago crime. First, it does not appear that crime substantially differs by day of week. Here are graphs showing all 4 categories.
Note: The lack of statistical significance was corroborated by linear models not shown here.
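Those corroborating models were presumably of roughly the form lm(Count ~ DayofWeek). Since they are not shown, here is a self-contained sketch on synthetic data (the real check would use crime_by_dayofweek):

```r
# Sketch of the day-of-week check: regress daily counts on day of week
# and inspect the F-test. Synthetic Poisson counts with no true day
# effect stand in for the actual crime_by_dayofweek data.
set.seed(42)
toy <- data.frame(
  DayofWeek = factor(rep(c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun"),
                         each = 20)),
  Count     = rpois(140, lambda = 50)
)
fit <- lm(Count ~ DayofWeek, data = toy)
anova(fit)  # a large p-value on DayofWeek means no detectable day effect
```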
#Create grouped data set.
crime_by_dayofweek <- subset(crime_df_no_NAs,!is.na(DayofWeek)) %>%
group_by(Meta.Category,Year,Month,DayinMonth, DayofWeek) %>%
summarise(Count = n())
#Hour Category
d1 <- ggplot(data = crime_by_dayofweek,aes(x = DayofWeek, y = Count)) +
scale_y_continuous(labels=comma) +
geom_bar(stat = "identity", fill="lightgreen", alpha=0.75)+
scale_color_brewer(palette="Set1", name = "Meta.Category") +
facet_wrap(~Meta.Category) +
labs(title = paste('Crime by Day of Week'), x = "Day of Week", y = "Occurrences") +
theme_light()
#pdf('Images/d1.pdf')
d1
#dev.off()
The following boxplots display greater detail for the Violent Crimes and Non-Violent Property Crime categories. It is interesting to note that property crimes fall on Saturday and Sunday while violent crimes rise.
d2 <- ggplot(data = subset(crime_by_dayofweek,Meta.Category %in% c("Violent Crimes","Non-Violent Property Crime")),aes(x = DayofWeek, y = Count, colour = Meta.Category)) +
scale_y_continuous(labels=comma) +
geom_boxplot(aes_string(colour="Meta.Category", fill="Meta.Category"), alpha=0.75)+
scale_color_brewer(palette="Set1", name = "Meta.Category") +
facet_wrap(~Meta.Category) +
labs(title = paste('Crime by Day of Week'), x = "Day of Week", y = "Occurrences") +
theme_light() +
theme(legend.position = "none")
#pdf('Images/d2.pdf')
d2
#dev.off()
Unlike day of week, there are large variances in crime frequency between hour categories:
#Create grouped data set.
crime_by_hour_category <- subset(crime_df_no_NAs,!is.na(Hour.Category)) %>%
group_by(Meta.Category,Year,Month,DayinMonth,Hour.Category) %>%
summarise(Count = n())
#Hour Category
h1 <- ggplot(data = crime_by_hour_category,aes(x = Hour.Category, y = Count)) +
scale_y_continuous(labels=comma) +
geom_bar(stat = "identity", fill="lightgreen", alpha=0.75)+
coord_flip() +
scale_color_brewer(palette="Set1", name = "Meta.Category") +
facet_wrap(~Meta.Category) +
labs(title = paste('Daily Crime per Hour Category'), x = "Hour Category", y = "Occurrences") +
theme_light()
#pdf('Images/h1.pdf')
h1
#dev.off()
The following boxplots display greater detail for the Violent Crimes and Non-Violent Property Crime categories. Note how property crimes peak over lunch and the PM commute, while violent crimes are more common later in the day. This distribution is statistically significant and will be explored more in the regression models below.
h2 <- ggplot(data = subset(crime_by_hour_category,Meta.Category %in% c("Violent Crimes","Non-Violent Property Crime")),aes(x = Hour.Category, y = Count, colour = Meta.Category)) +
scale_y_continuous(labels=comma) +
geom_boxplot(aes_string(colour="Meta.Category", fill="Meta.Category"), alpha=0.75)+
scale_color_brewer(palette="Set1", name = "Meta.Category") +
facet_wrap(~Meta.Category) +
labs(title = paste('Daily Crime by Hour Category'), x = "Hour Category", y = "Occurrences") +
theme_light() +
theme(legend.position = "none")
#pdf('Images/h2.pdf')
h2
#dev.off()
From here on, we will move toward mapping the data.
library(rgdal)
library(geosphere)
Chicago_shp <- readOGR(dsn = "data/Boundaries - Community Areas (current)", layer = "geo_export_70e8c7f3-e054-4acc-b740-ee8916d5fa41")
## OGR data source with driver: ESRI Shapefile
## Source: "data/Boundaries - Community Areas (current)", layer: "geo_export_70e8c7f3-e054-4acc-b740-ee8916d5fa41"
## with 77 features
## It has 9 fields
class(Chicago_shp)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
#Add in Area for each community in km^2
Chicago_shp@data$Poly_Area <- areaPolygon(Chicago_shp)/1000000 #Division required to turn into square km
Chicago_shp@data <- rename(Chicago_shp@data, Community.Name = community)
Chicago_shp@data <- rename(Chicago_shp@data, Area.Number = area_numbe)
#Subset the data to just Violent Crime and Non-Violent Property Crime
crime_subset_df <- subset(crime_df_no_NAs,Meta.Category %in% c("Violent Crimes","Non-Violent Property Crime"))
#Create data frame by community
crime_by_community <- crime_subset_df %>%
group_by(Meta.Category,Community.Name) %>%
summarise(Count = n())
#Convert to uppercase to match shape file
crime_by_community$Community.Name = toupper(crime_by_community$Community.Name)
#Join shape area data to crime_by_community and create density variable. We later do this join in the opposite direction to show how to do the same thing but for mapping purposes. Only the second method would actually be required as you can access shape file data using the @ symbol.
crime_by_community <- left_join(crime_by_community,Chicago_shp@data, by = "Community.Name")
crime_by_community <- as.data.frame(crime_by_community)
crime_by_community <- select(crime_by_community, Meta.Category, Community.Name, Count, Poly_Area)
crime_by_community <- mutate(crime_by_community, Density = (Count / Poly_Area)/Num_Years_in_Data) #Division is to annualize figure
Note: This comes after the Choropleth in the writeup
As an addendum to the above map, the following bar graph shows the 20 communities with the highest density ratings. It is ordered by a community’s overall crime density and then split out to show Non-Violent Property Crime separate from Violent Crimes.
#Plot crime by community by density
Comms_to_show <- c("LOOP","NEAR NORTH SIDE","SOUTH SHORE","WEST GARFIELD PARK",
"WEST ENGLEWOOD","ENGLEWOOD","EAST GARFIELD PARK","NORTH LAWNDALE","CHATHAM","AUSTIN",
"LAKE VIEW","ROGERS PARK","AUBURN GRESHAM","GRAND BOULEVARD","WASHINGTON PARK",
"HUMBOLDT PARK","WOODLAWN","GREATER GRAND CROSSING","WEST TOWN","CHICAGO LAWN")
g3 <- ggplot(data = subset(crime_by_community,Community.Name %in% Comms_to_show),aes(reorder(Community.Name, Density), y = Density, fill = Density)) +
scale_y_continuous(labels=comma) +
scale_fill_gradient(low = "lightblue", high = "red", guide = "colourbar") +
geom_bar(stat = "identity", alpha=0.75)+
coord_flip() +
facet_wrap(~Meta.Category) +
labs(title = paste('Top 20 Annualized Community Crime Rates by km^2'), x = "Community", y = "Occurrences") + theme_light()
#pdf('Images/g3.pdf')
g3
#dev.off()
Note: Some communities (like the Loop and the Near North Side) are very dense but markedly one-sided in the make-up of their crime. Both of these communities are relatively affluent, with per capita incomes of $65,526 and $88,669, respectively. Demographic-based analysis might see their crime rates as anomalous; however, they have a very large number of accessible mass transit locations within their communities. As we will see later, this at least partially explains their very high crime rates.
| Community | Per Capita Income | Number of Closely Accessible Transit Stops |
|---|---|---|
| Loop | $65,526 | 10 |
| Near North Side | $88,669 | 10 |
| Median | $20,956 | 4 |
summary(Chicago_shp)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x -87.94011 -87.52414
## y 41.64454 42.02304
## Is projected: FALSE
## proj4string : [+proj=longlat +ellps=WGS84 +no_defs]
## Data attributes:
## Area.Number comarea_ shape_area perimeter
## 1 : 1 Min. :0 Min. : 16913961 Min. :0
## 10 : 1 1st Qu.:0 1st Qu.: 49769639 1st Qu.:0
## 11 : 1 Median :0 Median : 79635753 Median :0
## 12 : 1 Mean :0 Mean : 83614533 Mean :0
## 13 : 1 3rd Qu.:0 3rd Qu.: 98853168 3rd Qu.:0
## 14 : 1 Max. :0 Max. :371835608 Max. :0
## (Other):71
## Community.Name comarea_id shape_len area_num_1
## ALBANY PARK : 1 Min. :0 Min. : 18138 1 : 1
## ARCHER HEIGHTS: 1 1st Qu.:0 1st Qu.: 31949 10 : 1
## ARMOUR SQUARE : 1 Median :0 Median : 43229 11 : 1
## ASHBURN : 1 Mean :0 Mean : 44398 12 : 1
## AUBURN GRESHAM: 1 3rd Qu.:0 3rd Qu.: 49478 13 : 1
## AUSTIN : 1 Max. :0 Max. :173626 14 : 1
## (Other) :71 (Other):71
## area Poly_Area
## Min. :0 Min. : 1.571
## 1st Qu.:0 1st Qu.: 4.624
## Median :0 Median : 7.398
## Mean :0 Mean : 7.768
## 3rd Qu.:0 3rd Qu.: 9.183
## Max. :0 Max. :34.545
##
bbox(Chicago_shp)
## min max
## x -87.94011 -87.52414
## y 41.64454 42.02304
#Limit the data to max community boundaries
crime_df_no_NAs <- subset(crime_df_no_NAs,
-87.94011 <= Longitude.Crime & Longitude.Crime <= -87.52414 &
41.64454 <= Latitude.Crime & Latitude.Crime <= 42.02304)
#Subset the data again for standard deviation ellipse shapes below.
violent_crime_subset <- subset(crime_df_no_NAs,Meta.Category %in% c("Violent Crimes"))
prop_crime_subset <- subset(crime_df_no_NAs,Meta.Category %in% c("Non-Violent Property Crime"))
library(sp)
# Convert to SpatialPointsDataFrame with longitude and latitude
#Crime data - all
coords <- SpatialPoints(crime_subset_df[, c("Longitude.Crime","Latitude.Crime")])
crime_spatial_df <- SpatialPointsDataFrame(coords, crime_subset_df)
proj4string(crime_spatial_df) <- CRS("+proj=longlat +ellps=WGS84")
#Crime data - for violent crime ellipse below
violent_coords <- SpatialPoints(violent_crime_subset[, c("Longitude.Crime","Latitude.Crime")])
violent_crime_spatial_df <- SpatialPointsDataFrame(violent_coords, violent_crime_subset)
proj4string(violent_crime_spatial_df) <- CRS("+proj=longlat +ellps=WGS84")
#Crime data - for property crime ellipse below
prop_coords <- SpatialPoints(prop_crime_subset[, c("Longitude.Crime","Latitude.Crime")])
prop_crime_spatial_df <- SpatialPointsDataFrame(prop_coords, prop_crime_subset)
proj4string(prop_crime_spatial_df) <- CRS("+proj=longlat +ellps=WGS84")
#Pull in Mass Transit and Police Department Locations
Mass_Transit <- read.csv("Data/CTAListofLStopsWithPDDistance.csv", sep = ",", header = TRUE)
Mass_Transit <- Mass_Transit[,c(1,2,3,4,5,6,7,9,12)]
#Transit_Coord <- Mass_Transit
Police_Dps <- read.csv("Data/PoliceStations.csv", sep = ",", header = TRUE)
Police_Dps <- Police_Dps[,c(1,6,7)]
#Create Ellipse Data frame
df_ellipse <- data.frame(prop_crime_spatial_df@coords, group="Non-Violent Property Crime")
df_ellipse <- rbind(df_ellipse, data.frame(violent_crime_spatial_df@coords, group="Violent Crimes"))
#Create points for Ellipses
library(ellipse)
df_ell <- data.frame()
for(g in levels(df_ellipse$group)){
df_ell <- rbind(df_ell, cbind(as.data.frame(with(df_ellipse[df_ellipse$group==g,], ellipse(cor(Longitude.Crime, Latitude.Crime),
scale=c(sd(Longitude.Crime),sd(Latitude.Crime)),
centre=c(mean(Longitude.Crime),mean(Latitude.Crime)),level = 0.68))),group=g))
}
ggplot2 works with data frames, not objects of class Spatial*, so we have to convert the shapefile using the fortify() function from ggplot2.
Chicago_shp_df <- fortify(Chicago_shp)
Create the background layer. Note: the ideal zoom setting would be 10.5, but get_googlemap only accepts integer zoom values.
## Use "get_map" command to download the images and format them for plotting
map_dat <- get_googlemap(center = c(lon = -87.723900, lat = 41.839530), zoom = 10,
maptype = "road", language = "en-EN", sensor = FALSE, messaging = FALSE,
urlonly = FALSE)
Use “ggmap” command to make the background plot
ChicagoMap <- ggmap(map_dat, extent = "device", legend = "topleft")
ChicagoMap
Draw 68% confidence interval on maps for both Violent and Property Crimes.
CMap_ellipse_all <- ChicagoMap +
geom_point(data = df_ellipse,aes(x = Longitude.Crime, y = Latitude.Crime, colour = group),alpha = .007) +
geom_path(data=df_ell, aes(x=x, y=y), size=1, col = "white", linetype=2)+
facet_wrap(~group)+
theme(panel.margin = unit(1, "lines"))
#pdf('Images/ellipses.pdf')
CMap_ellipse_all +guides(colour=FALSE)
#dev.off()
Zoom the map in a little for the heatmaps.
map_dat <- get_googlemap(center = c(lon = -87.723900, lat = 41.839530), zoom = 11,
maptype = "road", language = "en-EN", sensor = FALSE, messaging = FALSE,
urlonly = FALSE)
ChicagoMap <- ggmap(map_dat, extent = "device", legend = "topleft")
#Add Shapefile
ChicagoMap2 <- ChicagoMap +
geom_polygon(aes(x = long, y = lat, group = group), data = Chicago_shp_df, colour = "black", fill = NA, size = .2)
ChicagoHeatMap <- ChicagoMap2 +
stat_density2d(aes(x = Longitude.Crime,
y = Latitude.Crime,
fill = ..level..,
alpha = ..level..),
na.rm = TRUE,
size = .05,
bins = 12,
data = crime_subset_df,
geom = "polygon",
colour = "grey") +
## Configure the scale and panel
scale_fill_continuous(low = "red", high = "darkred", guide = "colourbar") +
scale_alpha(range = c(.1,.6)) +
## Title and labels
labs(x = "Longitude", y = "Latitude") +
ggtitle(paste("Crimes in/around Chicago")) +
## Other theme settings
theme(
plot.title = element_text(size = 26, face = 'bold', vjust = 2),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.position = "none",
strip.background = element_rect(fill = 'grey80'),
strip.text = element_text(size = 18)
)
The following is an overall image of Non-violent Property Crime and Violent Crimes around Chicago. Property crime is highly focused downtown with a few moderate hot spots elsewhere. On the other hand, violent crime is widely dispersed with multiple hot zones.
#pdf('Images/heatmaps.pdf')
ChicagoHeatMap +
facet_wrap(~Meta.Category)+
theme(panel.margin = unit(1, "lines"))+
guides(colour=FALSE)
#dev.off()
First, create a circle function for drawing radius zones.
# create circles data frame from the PD data frame
make_circles <- function(pointlist, radius, nPoints = 100){
# radius: radius is measured in kilometer
meanLat <- mean(pointlist$Latitude)
# km per degree of longitude shrinks with latitude, so apply a cosine correction.
# 1 degree of latitude is approximately 111 km; 57.3 converts degrees to radians (180/pi).
radiusLon <- radius / 111 / cos(meanLat/57.3)
radiusLat <- radius / 111
circleDF <- data.frame(ID = rep(pointlist[,c(1)], each = nPoints))
angle <- seq(0,2*pi,length.out = nPoints)
circleDF$lon <- unlist(lapply(pointlist$Longitude, function(x) x + radiusLon * cos(angle)))
circleDF$lat <- unlist(lapply(pointlist$Latitude, function(x) x + radiusLat * sin(angle)))
return(circleDF)
}
# Create data frame for circles around PDs
PDCircles <- make_circles(Police_Dps, 2)
If our initial hypothesis regarding proximity to police departments were correct, we should see dips in a localized heat map around police markers. The following heat map shows police departments as markers with 2 km radius zones drawn around them. Close inspection reveals that PDs are typically not located at peaks. However, they also don’t exhibit any strong reductive effect on crime rates in their immediate surrounding areas. This map isn’t detailed enough to formally reject the first part of our hypothesis, but it doesn’t look promising.
Note: We used a 2 km radius assuming that any proximity effects would be de minimis beyond that range (if there are any at all).
Graph:
ChicagoHeatMapPD <- ChicagoHeatMap +
geom_point(aes(x = Longitude, y = Latitude), data = Police_Dps, color="blue", size = 3, alpha = .5) +
geom_polygon(data = PDCircles, aes(lon, lat, group = ID), color = "blue", alpha = .1, fill = "blue")
#pdf('Images/heatmapwithPD.pdf')
ChicagoHeatMapPD #+
# facet_wrap(~Meta.Category)+
# theme(panel.margin = unit(1, "lines"))+
# guides(colour=FALSE)
#dev.off()
A transit station overlay (as represented by the L) shows a great deal of overlap between common transportation avenues and crime rates. Again, this is not enough to confirm the hypothetical relationship between transit proximity and spatial crime rates, but it does provide a promising initial picture.
Note: Although we do not have the data to confirm it, we suspect that this relationship would only be reinforced by adding bus routes:
ChicagoHeatMapTrans <- ChicagoHeatMap +
geom_point(aes(x = Longitude, y = Latitude), data = Mass_Transit, color="darkgreen", size = 3, alpha = .5)
#pdf('Images/heatmapwithTrans.pdf')
ChicagoHeatMapTrans #+
# facet_wrap(~Meta.Category)+
# theme(panel.margin = unit(1, "lines"))+
# guides(colour=FALSE)
#dev.off()
#Add density to Chicago_shp file
#Create data frame by community
crime_by_community2 <- crime_subset_df %>%
group_by(Community.Name) %>%
summarise(Count = n(),
Num.Close.Transit = n_distinct(Transit.index),
Percent.of.Housing.Crowded = median(Percent.of.Housing.Crowded),
Percent.Households.Below.Poverty = median(Percent.Households.Below.Poverty),
Percent.Aged.16.Unemployed = median(Percent.Aged.16.Unemployed),
Percent.Aged.25.Without.Hish.School.Diploma = median(Percent.Aged.25.Without.Hish.School.Diploma),
Census.Population = median(Census.Population),
Hardship.Index = median(Hardship.Index),
Per.Capita.Income = median(Per.Capita.Income))
#Convert to uppercase to match shape file
crime_by_community2$Community.Name = toupper(crime_by_community2$Community.Name)
#Join shape area data to crime_by_community and create density variable
Chicago_shp@data <- left_join(Chicago_shp@data, crime_by_community2, by = "Community.Name")
Chicago_shp@data <- mutate(Chicago_shp@data, Density = (round((Count / Poly_Area), 2))/Num_Years_in_Data)
Chicago_shp@data <- mutate(Chicago_shp@data, People.Per.km = (round((Census.Population / Poly_Area), 2)))
#Create slimmer version for choropleth:
choro <- Chicago_shp[,c("Area.Number","Community.Name","Density","Num.Close.Transit","People.Per.km","Percent.Households.Below.Poverty","Percent.Aged.16.Unemployed")]
So far, we have demonstrated that crime rate (density) increases as one moves closer to both police stations and transit locations. However, given the structure and political influence of community organizations, it may prove equally useful to view the problem, and any resource deployment recommendations, within community borders. The following choropleth displays Chicago communities according to increasing crime “density,” as measured by annualized crime rate per square kilometer.
Note: This is an image of the interactive html file available via the project’s Github folder.
#Commented out here as it creates a separate html
#library(plotGoogleMaps)
#plotGoogleMaps(choro,zcol="Density",filename="Crimes_Communities.html",layerName="Density of Crimes", fillOpacity=0.4,strokeWeight=0,mapTypeId="ROADMAP")
We chose a number of community level demographics to explore for our final regression models. To summarize, we found that only 4 of the chosen factors were significantly correlated to a community’s crime rate (density). See the table and graphical correlation matrix below:
Note: We will use the number of transit stations accessible to a given community as a proxy for individual crimes’ proximity figures in our regression models. Also, note the collinearity between “% Households Below the Poverty Line” vs. “% Aged 16 Unemployed” (corr = 0.81) and “# of Transit Stations” vs. “Population Density” (corr = 0.48). This will dictate choosing one of each pair.
Note: The density correlation table can be found in the image folder in the Github project.
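Each test printed below is a Pearson cor.test of Density against one predictor from Chicago_shp@data. A self-contained sketch on synthetic vectors (74 observations, matching the df = 72 shown in the output, since cor.test uses df = n - 2):

```r
# Sketch of one pairwise test; synthetic vectors stand in for the real
# Density and predictor columns from Chicago_shp@data.
set.seed(1)
x <- rnorm(74)                       # e.g., community crime density
y <- 0.6 * x + rnorm(74, sd = 0.8)   # e.g., a correlated predictor
ct <- cor.test(x, y)                 # Pearson by default
ct$estimate                          # sample correlation
ct$p.value                           # two-sided p-value
```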
##
## Pearson's product-moment correlation
##
## data: Chicago_shp@data$Density and Chicago_shp@data$Num.Close.Transit
## t = 6.323, df = 72, p-value = 1.907e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4273692 0.7267872
## sample estimates:
## cor
## 0.5975183
##
## Pearson's product-moment correlation
##
## data: Chicago_shp@data$Density and Chicago_shp@data$Hardship.Index
## t = 1.0346, df = 72, p-value = 0.3043
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1105222 0.3401254
## sample estimates:
## cor
## 0.1210331
##
## Pearson's product-moment correlation
##
## data: Chicago_shp@data$Density and Chicago_shp@data$Per.Capita.Income
## t = 1.4717, df = 72, p-value = 0.1455
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.0599510 0.3843783
## sample estimates:
## cor
## 0.1708887
##
## Pearson's product-moment correlation
##
## data: Chicago_shp@data$Density and Chicago_shp@data$Percent.of.Housing.Crowded
## t = 0.35381, df = 72, p-value = 0.7245
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1886336 0.2676121
## sample estimates:
## cor
## 0.04166083
##
## Pearson's product-moment correlation
##
## data: Chicago_shp@data$Density and Chicago_shp@data$Percent.Households.Below.Poverty
## t = 3.5473, df = 72, p-value = 0.0006893
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1723994 0.5644557
## sample estimates:
## cor
## 0.3857041
##
## Pearson's product-moment correlation
##
## data: Chicago_shp@data$Density and Chicago_shp@data$Percent.Aged.16.Unemployed
## t = 2.0115, df = 72, p-value = 0.04801
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.002289403 0.436176558
## sample estimates:
## cor
## 0.2306675
##
## Pearson's product-moment correlation
##
## data: Chicago_shp@data$Density and Chicago_shp@data$Percent.Aged.25.Without.Hish.School.Diploma
## t = -0.63253, df = 72, p-value = 0.529
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2977788 0.1568244
## sample estimates:
## cor
## -0.07433817
##
## Pearson's product-moment correlation
##
## data: Chicago_shp@data$Density and Chicago_shp@data$People.Per.km
## t = 4.3812, df = 72, p-value = 3.939e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2572555 0.6220714
## sample estimates:
## cor
## 0.4587857
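The repeated cor.test calls above can also be generated in a single loop. Below is a minimal, self-contained sketch using simulated data; in the real analysis you would substitute Chicago_shp@data and its actual column names (the data frame and values here are invented for illustration):

```r
# Simulated stand-in for Chicago_shp@data (74 communities)
set.seed(42)
df <- data.frame(
  Density           = rnorm(74, mean = 50, sd = 15),
  Num.Close.Transit = rpois(74, lambda = 3),
  Hardship.Index    = runif(74, 1, 100),
  Per.Capita.Income = rnorm(74, 25000, 8000)
)

# Run cor.test of Density against every other column and collect the key stats
predictors <- setdiff(names(df), "Density")
results <- do.call(rbind, lapply(predictors, function(v) {
  ct <- cor.test(df$Density, df[[v]])
  data.frame(variable = v,
             cor      = unname(ct$estimate),
             p.value  = ct$p.value)
}))
print(results)
```

The resulting data frame makes it easy to filter for significant factors (e.g., `subset(results, p.value < 0.05)`) rather than reading each test individually.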
Given the above factors, we chose to model the linear relationships between a given community’s overall crime count (and, separately, its crime density) and the factors from the above analysis that appeared most significant, taking collinearity into account. These are Hour Category, the number of transit stations in close proximity to a given community, % Households Below the Poverty Line, and minimum distance to a transit station (as a proxy for access to any public transportation).
A model based on these few factors can explain roughly 41% of the variance in community crime counts. Note: Minimum Distance is not a significant factor in this model.
##
## Call:
## lm(formula = Count ~ Hour.Category + Num.Close.Transit + Percent.Households.Below.Poverty +
## Minimum.Transit.Dist, data = crime_by_community3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -755.23 -161.67 -56.18 111.33 1403.43
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -162.091 32.580 -4.975 7.84e-07 ***
## Hour.CategoryAM Commute 168.201 31.376 5.361 1.06e-07 ***
## Hour.CategoryLunch 291.159 31.375 9.280 < 2e-16 ***
## Hour.CategoryPM Commute 348.592 31.377 11.110 < 2e-16 ***
## Hour.CategoryEvening 292.607 31.375 9.326 < 2e-16 ***
## Hour.CategoryNight 163.950 31.375 5.225 2.17e-07 ***
## Num.Close.Transit 24.015 1.413 16.990 < 2e-16 ***
## Percent.Households.Below.Poverty 7.131 0.785 9.084 < 2e-16 ***
## Minimum.Transit.Dist -7.464 5.812 -1.284 0.199
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 269.9 on 879 degrees of freedom
## (55 observations deleted due to missingness)
## Multiple R-squared: 0.4136, Adjusted R-squared: 0.4083
## F-statistic: 77.5 on 8 and 879 DF, p-value: < 2.2e-16
That is, within a given year and hour category, the above variables can explain roughly 41% of the variance in crime counts across communities.
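As a sketch of how a model like this is fit and its fit statistic extracted, the following uses a simulated stand-in for crime_by_community3; the column names match the real data, but the values and coefficients are invented for illustration:

```r
set.seed(1)
n <- 200
# Simulated community/hour-category observations
sim <- data.frame(
  Hour.Category = factor(sample(c("Overnight", "AM Commute", "Lunch",
                                  "PM Commute", "Evening", "Night"),
                                n, replace = TRUE)),
  Num.Close.Transit                = rpois(n, 3),
  Percent.Households.Below.Poverty = runif(n, 3, 55),
  Minimum.Transit.Dist             = runif(n, 0.1, 5)
)
# Construct a Count with a known linear signal plus noise
sim$Count <- with(sim, 25 * Num.Close.Transit +
                        7 * Percent.Households.Below.Poverty +
                        rnorm(n, sd = 50))

fit <- lm(Count ~ Hour.Category + Num.Close.Transit +
            Percent.Households.Below.Poverty + Minimum.Transit.Dist,
          data = sim)
summary(fit)$adj.r.squared  # adjusted R-squared, the figure quoted in the text
```

Checking `summary(fit)` also surfaces the per-coefficient p-values used above to flag Minimum Distance as non-significant.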
A second model based on the same factors can explain roughly 51% of the variance in community crime density.
##
## Call:
## lm(formula = Density ~ Hour.Category + Num.Close.Transit + Percent.Households.Below.Poverty +
## Minimum.Transit.Dist, data = crime_by_community4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -68.22 -16.76 -1.79 13.92 229.05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -21.06111 3.65400 -5.764 1.14e-08 ***
## Hour.CategoryAM Commute 24.42560 3.51890 6.941 7.55e-12 ***
## Hour.CategoryLunch 42.76953 3.51888 12.154 < 2e-16 ***
## Hour.CategoryPM Commute 50.40680 3.51899 14.324 < 2e-16 ***
## Hour.CategoryEvening 41.45542 3.51885 11.781 < 2e-16 ***
## Hour.CategoryNight 22.28764 3.51884 6.334 3.81e-10 ***
## Num.Close.Transit 2.77874 0.15852 17.529 < 2e-16 ***
## Percent.Households.Below.Poverty 1.19176 0.08804 13.536 < 2e-16 ***
## Minimum.Transit.Dist -2.81023 0.65185 -4.311 1.81e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.27 on 879 degrees of freedom
## (55 observations deleted due to missingness)
## Multiple R-squared: 0.5165, Adjusted R-squared: 0.5121
## F-statistic: 117.4 on 8 and 879 DF, p-value: < 2.2e-16
I ran the following test to see whether transit stations closer to PDs had lower crime rates. They do not: the correlation is weakly negative and not statistically significant.
#Create data frame by Station Name
Mass_Transit_Closest_PD <- subset(Mass_Transit,PD.Dist.km < 2) %>%
group_by(STATION_NAME, Transit.index, STOP_NAME) %>%
summarise(Closest.PD.Dist = median(PD.Dist.km))
Mass_Transit_Closest_PD <- rename(Mass_Transit_Closest_PD, Station.Name = STATION_NAME)
km_to_Transit_Crimes <- subset(crime_subset_df, Transit.Dist.km < 2) %>%
group_by(Transit.index) %>%
summarise(Count = n())
#Join Station Proximity Data
Mass_Transit_Closest_PD <- left_join(Mass_Transit_Closest_PD, km_to_Transit_Crimes, by = "Transit.index")
Mass_Transit_Closest_PD$Station.Name <- as.factor(Mass_Transit_Closest_PD$Station.Name)
# Effect of a transit station's proximity to a PD on crime near that station
g12 <- ggplot(data = Mass_Transit_Closest_PD, aes(y = Count, x = Closest.PD.Dist, color = Count)) +
geom_point(shape = 19, size = 4) +
scale_color_gradient(low = "lightblue", high = "red") +
scale_y_continuous(labels=comma) +
scale_x_continuous(labels=comma) +
geom_smooth(method = "lm") +
  labs(title = paste('Crime Frequency in 2 km Radius Around Transit Station vs. Station Distance from Police Department'), x = "Distance to Closest PD (km)", y = "Count of Crime") +
theme_light()
#pdf('Images/g6.pdf')
g12
#dev.off()
cor.test(Mass_Transit_Closest_PD$Count,Mass_Transit_Closest_PD$Closest.PD.Dist)
##
## Pearson's product-moment correlation
##
## data: Mass_Transit_Closest_PD$Count and Mass_Transit_Closest_PD$Closest.PD.Dist
## t = -1.4982, df = 161, p-value = 0.136
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2661814 0.0371309
## sample estimates:
## cor
## -0.1172589
Based on this analysis, it would appear that crime density actually increases as one moves closer to both police departments and transit stations. With regard to police departments, this bias may simply reflect that police are more likely to make an arrest for a crime where they most regularly patrol.
While it may not have the hypothesized benefit of reducing crime rates, placing police departments near transit stations would still give police the best chance to affect crime in their areas. As such, the city may still be best served by building small, permanent outposts at its most affected transit stations rather than a single, large department. This strategy could reduce costs while having a targeted effect on demonstrated hot spots. Regular assignments to these posts could have the added benefit of improving community relations and participation through personal familiarity. I would recommend running a few trials in the densest communities to determine the effect. Per the choropleth and bar graphs in the community section of this paper, I would recommend running separate trials in West and East Garfield for violent crimes vs. the Loop and the Near North Side for property crimes. Trials should help to isolate divergent effects on violent crimes vs. non-violent property crimes.
As has been reported in recent (spring of 2015) news, Chicago is “off to (its) deadliest start in nearly two decades.” Homicides and shootings are up by more than 70% year over year through the first quarter, while arrests and investigative stops have fallen (follow the link for the source). The reasons for those shifts are beyond the scope of this analysis. However, a time lapse analysis could shed some light on growth patterns. If the client wishes to move forward, I would suggest the following:
Other than relatively weak weekend effects, patrol schedules do not need to adjust for day-of-week volumes. There are, however, strong shifts in hourly volume, and staff should be scheduled accordingly.
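One simple way to turn those hourly shifts into a staffing guide is to compute each hour’s share of total crime volume. The sketch below uses simulated crime hours in place of the real crime timestamps (the skew toward afternoon/evening is invented for illustration):

```r
set.seed(7)
# Simulated hour-of-day for 5,000 crimes, weighted toward afternoon/evening
hours <- sample(0:23, 5000, replace = TRUE,
                prob = c(rep(1, 6), rep(2, 6), rep(4, 6), rep(3, 6)))

# Share of daily volume falling in each hour: a direct staffing weight
hourly_share <- as.numeric(table(factor(hours, levels = 0:23))) / length(hours)
staffing <- data.frame(hour = 0:23, weight = hourly_share)
head(staffing)
```

Multiplying each weight by the total number of officer-hours available per day yields a proportional-to-volume shift allocation.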
As a follow up to this analysis, I would suggest the following: